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Abstract: We formally prove correct a C program that implements a numerical scheme for the 
resolution of the one-dimensional acoustic wave equation. Such an implementation introduces 
errors at several levels: the numerical scheme introduces method errors, and floating-point com- 
putations lead to round-off errors. We annotate this C program to specify both method error and 
round-off error. We use Frama-C to generate theorems that guarantee the soundness of the code. 
We discharge these theorems using SMT solvers, Gappa, and Coq. This involves a large Coq devel- 
opment to prove the adequacy of the C program to the numerical scheme and to bound errors. To 
our knowledge, this is the first time such a numerical analysis program is fully machine-checked. 
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Resolution numerique de 1 'equation des ondes : une preuve 
mecanisee complete d'un programme C 



Resume : Nous prouvons formcllement la correction d'un programme C implementant un 
schema numerique pour la resolution de F equation des ondes acoustiques en dimension 1. Une 
telle implementation introduit diffcrents types d'erreurs : l'erreur de methodc due au schema 
numerique et l'erreur d'arrondi due aux calculs en virgule flottante. Nous annotons ce programme 
C pour specifier ces deux types d'erreur. Nous utilisons Frama-C pour generer les theoremes qui 
garantissent la correction du code. Nous prouvons ces theoremes a l'aide de solveurs SMT, de 
Gappa et de Coq. Un developpement Coq important est necessaire pour prouver l'adequation du 
programme C au schema numerique et pour borner les erreurs. A notre connaissance, e'est la 
premiere fois qu'un tel programme d'analyse numerique est completement verifie mecaniquement. 

Mots-cles : preuve formellc d'un programme numerique, convergence d'un schema numerique, 
preuve de programme C, preuve formelle en Coq, equation des ondes acoustiques, equation aux 
derivees partielles, analyse d'erreur d'arrondi. 
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1 Introduction 

Ordinary differential equations (ODE) and partial differential equations (PDE) are ubiquitous in 
engineering and scientific computing. They show up in nuclear simulation, weather forecast, and 
more generally in numerical simulation, including block diagram modelization. Since solutions 
to nontrivial problems are non-analytic, they must be approximated by numerical schemes over 
discrete grids. 

Numerical analysis is a part of applied mathematics that is mainly interested in proving the 
convergence of these schemes |22j . that is, proving that approximation quality increases as the size 
of discretization steps decreases. The approximation quality represents the distance between the 
exact continuous solution and the approximated discrete solution; this distance must tend toward 
zero in order for the numerical scheme to be useful. 

A numerical scheme is typically proved to be convergent with pen and paper. This is a 
difficult, time-consuming, and error-prone task. Then the scheme is implemented as a C/CH — h or 
Fortran program. This introduces new issues. First, we must ensure that the program correctly 
implements the scheme and is immune from runtime errors such as out-of-bounds accesses or 
overflows. Second, the program introduces round-off errors due to floating-point computations and 
we must prove that those errors could not lead to irrelevant results. Typical pen-and-paper proofs 
do not address floating-point nor runtime errors. Indeed the huge number of proof obligations, and 
their complexity, make the whole process almost intractable. However, with the help of mechanized 
program verification, such a proof becomes feasible. In the first place, because automated theorem 
provers can alleviate the proof burden. More importantly, because the proof is guaranteed to cover 
all aspects of the verification. 

Our case study. We consider the acoustic wave equation in an one-dimensional space domain. 
The equation describes the propagation of pressure variations (or sound waves) in a fluid medium; 
it also models the behavior of a vibrating string. Among the wide variety of numerical schemes 
to approximate the ID acoustic wave equation, we choose the simplest one: the second order 
centered finite difference scheme, also known as three-point scheme. To keep it simple, we assume 
an homogeneous medium (the propagation velocity is constant) and we consider discretization 
over regular grids with constant discretization steps for time and space. Our goal is to prove the 
correctness of a C program implementing this scheme. 

Method and tools. We use the Jessie plug-in of Frama-C [HI [33J to perform the deductive 
verification of this C program. The source code is augmented with ACSL annotations [5] to 
describe its formal specification. When submitted to Frama-C, proof obligations are generated. 
Once these theorems are proved, the program is guaranteed to satisfy its specification and to 
be free from runtime errors. Part of the proof obligations are discharged by automated provers, 
e.g. Alt-Ergo [TO], CVC3 [5 , Gappa [25], and Z3 [35]. The more complicated ones, such as the 
one related to the convergence of the numerical scheme, cannot be proved automatically. These 
obligations were manually proved with the Coq [SJ [20] interactive proof assistant. In the end, we 
have formally verified all the properties of the C program. To our knowledge, this is the first time 
this kind of verification is machine- checked. The annotated C program and the Coq sources of the 
formal development are available from 

http : //f ost . saclay . inria. f r/wave_total_error .html 



State of the art. There is an abundant literature about the convergence of numerical schemes, 
e.g. see •">(>. [55] . In particular, the convergence of the three-point scheme for the wave equation 
is well-known and taught relatively early [7]. Unfortunately, no article goes into all the details 
needed for a formal proof. These mathematical "details" may have been skipped for readability, 
but some mandatory details may have also been omitted due to oversights. 

In the fields of automatic provers and proof assistants, few works have been done for the 
formalization and mechanical proofs of mathematical analysis, and even fewer works for numerical 
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analysis. The first developments on real numbers and real analysis are from the late 90's, in 
systems such as ACL2 [34 , Coq g5J, HOL Light [35], Isabelle [3J, Mizar [5J], and PVS [3D]. An 
extensive work has been done by Harrison regarding R™ and the dot product [37] . Constructive real 
analysis [35J [2H |3D] and further developments in numerical analysis [ID] [5DJ have been carried out 
at Nijmegen. We can also mention the formal proof of an automatic differentiation algorithm [46 . 

As explained by Rosinger in 1985, old methods to bound round-off errors were based on "un- 
realistic linearizing assumptions" |51j . Further work was done under more realistic assumptions 
about round-off errors [511 152) , but none of these assumptions were proved correct with respect 
to the numerical schemes. As Roy and Oberkampf, we believe that round-off errors should not be 
treated as random variables and that traditional statistical methods should not be used [53] . They 
propose the use of interval arithmetic or increased precision to control accuracy. Note that their 
example of hypersonic nozzle flow is converging so fast that round-off errors can be neglected [53]. 
Interval arithmetic can also take method error into account [55] . The final interval is then claimed 
to contain the exact solution. This is not formally proved, though. Additionally, the width of the 
final interval can be quite large. 

There are other tools to bound round-off errors not dedicated to numerical schemes. Some 
successful approaches are based on abstract interpretation [53J [3D]. In our case, they are of 
little help, since there is a complex phenomenon of error compensation during the computations. 
Ignoring this compensation would lead to bounds on round-off errors growing as fast as 0(2 k ) (k 
being the number of time steps). That is why we had to thoroughly study the propagation of 
round-off errors in this numerical scheme to get tighter bounds. It also means that most of the 
proofs had to be done by hand to achieve this part of the formal verification. 

Outline. Section [5] presents the PDE, the numerical scheme, and their mathematical properties. 
Section [3J is devoted to the proofs of the convergence of the numerical scheme and the upper 
bound for the round-off error. Finally, Section 0] describes the formalization, i.e. the tools used, 
the annotated C program, and the mechanized proofs. 

2 Numerical Scheme for the Wave Equation 

A partial differential equation (PDE) modeling an evolution problem is an equation involving 
partial derivatives of an unknown function of several independent space and time variables. The 
uniqueness of the solution is obtained by imposing initial conditions, i.e. values of the function 
and some of its derivatives at initial time. The problem of the vibrating string tied down at both 
ends, among many other physical problems, is formulated as an initial-boundary value problem 
where the boundary conditions are additional constraints set on the boundary of the supposedly 
bounded domain (56j . 

This section, as well as the steps taken at Section 13.11 to conduct the convergence proof of the 
numerical scheme, is inspired by [JJ. 

2.1 The Continuous Equation 

The chosen PDE models the propagation of waves along an ideal vibrating elastic string that is 
tied down at both ends, see [HUB], see a ^ so Fig ure Hl The PDE is obtained from Newton's laws 
of motion 05] . 

The gravity is neglected, so the string is supposed rectilinear when at rest. Let a: m i n and a; max be 
the abscissas of the extremities of the string. Let VL = [x m i n ,x max ] be the bounded space domain. 
Let p(x, t) be the transverse displacement of the point of the string of abscissa x at time t from 
its equilibrium position; it is a (signed) scalar. Let c be the constant propagation velocity; it is a 
positive number that depends on the section and density of the string. Let s(x, t) be the external 
action on the point of abscissa x at time t; it is a source term, such that t = =>■ s(x,t) = 0. 
Finally, let Po{x) and pi(x) be the initial position and velocity of the point of abscissa x. We 



INRIA 



Wave Equation Numerical Resolution: a Comprehensive Mechanized Proof of a C Program 



5 



2.5 



CD 



1.5 



0.5 



0.2 0.4 0.6 

length 



0.8 



Figure 1: Space-time representation of the signal propagating along a vibrating string. Each curve 
represents the string at a different time step. 

consider the initial-boundary value problem 



(1) 

(2) 

(3) 
(4) 



Vt >0,Vxe n, 

Vx G SI, 

Vx G fl, 
Vt > 0, 



q2 

(L(c)p)(x,t) = —(x,t) + A(c)p(x,t) = s(x,t), 

(L lP )(x,0)^^(x,0)= Pl (x), 

(L p)(x,0) d = p(x,0) =po(x), 
p(x min ,t) =p{x m3X ,t) = 



where the differential operator A(c) is defined by 



(5) 



A(c) = - . 



dx 2 ' 



This simple partial derivative equation happens to possess an analytical solution given by the 
so-called d'Alembert's formula [30], obtained from the method of characteristics and the image 
theory |38], Vt > 0, Vx G fl, 



(6) p(x,t) = -(p (x-ct) +p Q (x + ct)) + 



2c 



x-\-ct 



Pi(y)dy- 



l 

2c 



x-\-c(t— a) 



-c(t-a) 



%, °~)dy da 



where the quantities po, Pi, and s are respectively the successive antisymmetric extensions in space 
of P q, pi, and s defined on f2 to the entire real axis K. 

We have formally verified d'Alembert's formula as a separate work 42 . But for the purpose of 
the current work, we just admit that under reasonable conditions on the Cauchy datapo andpi and 
on the source term s, there exists a unique solution p to the initial-boundary value problem (dJ-Q 
for each c > 0. Simply supposing the existence of a solution instead of exhibiting it, opens the 
way to scale our approach to more general cases for which there is no known analytic expression 
of a solution, e.g. in the case of a nonuniform propagation velocity c. 
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For such a solution p, it is natural to associate at each time t the positive definite quadratic 
quantity 



(7) 



mm) 



def 



1 



dp 



+ l -\\( X ^p(x,t))\\\c) 



where (q,r) = f J^q(x)r(x)dx, \\q\\ 2 == f (q,q) and ||g||^( c ) = f (A(c) q, q) . The first term is inter- 
preted as the kinetic energy, and the second term as the potential energy, making E the mechanical 
energy of the vibrating string. 



2.2 The Discrete Equations 

Let j max be the positive number of intervals of the space discretization. Let the space discretization 
step Ax and the discretization function i^x be defined as 



A def ^max 
X = - 



and iAx(x) = f 



X x m 
Ax 



Let us consider the time interval [0,i max ]. Let At £]0,i max [ be the time discretization step. 
We define 



kAt(t) 



def 



t 

At 



and fc n 



def 



Now, the compact domain O x [0, i max ] is approximated by the regular discrete grid defined by 
(8) Vfc e [0..fc max ], Vz e [0.i max ], d = { Xi , t k ) d = (x min + tAx, kAt). 

For a function q defined over SI x [0,t max ] (resp. f2), the notation q^ denotes any dis- 
crete approximation of q at the points of the grid, i.e. a discrete function over [0..i max ] x 
[0..fc max ] (resp. [0..i max ]). By extension, the notation q^ is also a shortcut to denote the ma- 
trix (gk)o<»<w,o<fc 

approximation defined on [0..i max ] x [0..fc max ] by 

k def , k 



; <fe max (resp. the vector (<7i)o<i<i max )- The notation q^ is reserved to the 



def 



q(xJ ) (resp. qi = q(xi)) 



t i 

t k+i 

t k 



k-l 



x j-l x j x j+l X 

Figure 2: Three-point scheme: p k+1 (at x) depends on p k _ 1 , p k , p k + \, and p k ~ x (at •). 



Let poh and pih be two discrete functions over [0.i max ]. Let Sh be a discrete function over 
[0..i max ] x [0..fc max ]. Then, the discrete function ph over [0..i max ] x [0..fc max ] is said to be the 
solution of the three-poini0 finite difference scheme, as illustrated in Figure [5J when the following 
set of equations holds: 

(9) Vfc e [2.. 

" nmx 1, V? g fl. 

■''max J- J i 

1 In the sense "three spatial points", for the definition of matrix A^(c). 
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(10) 



Vi G [l..t„ 



(iih(c) Ph). 



de f pi_P? + At (A(c)( .^ p o ))j 



Pi, 



(11) Vi € [l.imax " 

(12) vfe e [o..fc n 



1], (^OhPh) l =PO,: 



where the matrix A^(c), a discrete analog of A(c), is defined for any vector gh, by 
(13) Vz e [l..w - 1], {A h (c) q h ) i =' - c- 

A discrete analog of the energy is also defined bjH 



Ax 2 



(14) 



£h(c)(p h ) 



def 1 

" 2 



P? + 1 -rf 

At 



where, for any vectors % and r^, 



(9h,rh) Aa 



def 



(gh,r h ) Ah(c) d = (Ah(c)gh,rh) A x: 



Ax ' 



Note that the three-point scheme is parameterized by the discrete Cauchy data poh and pi\ l7 
and by the discrete source term Sh- Of course, when these discrete inputs are respectively approx- 
imations of the continuous functions po, p%, and s (e.g. when poh = poh, Pih — Pih, and Sh = Sh)> 
then the discrete solution p^ is an approximation of the continuous solution p. 

2.3 Convergence 

Let £ be in ]0,1[. The CFL(£) condition (for Courant-Friedrichs-Lewy, see [35]) states that the 
discretization steps satisfy the relation 



(15) 



The convergence error eh measures the distance between the continuous and discrete solutions. 
It is defined by 



(16) 



V/fc G [0..k max ], Vi G [0.. 



def _u u 

= Pi-pt 



Note that when poh = Poh, then for all i, e° = 0. The truncation error Sh measures at which 
precision the continuous solution satisfies the numerical scheme. It is defined for k G [2..fc max ] and 
i G [L.imax - 1] by 



(17) 
(18) 
(19) 



= f (L h (c)p h )1 



-k-l 



def 



e\ = (iih(c)ph)» ~Pi,i, 
.0 



(LohPh)i - Po, 



Again, note that when p h = Poh and pn, = pih, then for all i, e° = and e| = e\j At. 
Furthermore, when there is also Sh = Shj then the convergence error eh is itself solution of the 
same numerical scheme with inputs defined by, for all i, k, 

e l 

Po,i=£° = 0, p lti = el = ands?=£? +1 . 



2 By convention, the energy is denned between steps k and fe + 1, hence the notation k + i. 
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The numerical scheme is said to be convergent of order 2 if the convergence error tends toward 
zero at least as fast as Ax 2 + At 2 when both discretization steps tend toward zeroU More precisely, 
the numerical scheme is said to be convergent of order (m,n) uniformly on the interval [0, t max ] if 
the convergence error satisfies!! 



(20) 



(i i ^ e-' AtW ) \\ Ac = O [0 , tmax] (Ax m + At n ). 



The numerical scheme is said to be consistent with the continuous problem at order 2 if the 
truncation error tends toward zero at least as fast as Ax 2 + At 2 when the discretization steps 
tend toward 0. More precisely, the numerical scheme is said to be consistent with the continuous 
problem at order (m, n) uniformly on interval [0, i max ] if the truncation error satisfies 



(21) 



O 



Ax 



[0,t„ 



At 7 



The numerical scheme is said to be stable if the discrete solution of the associated homogeneous 
problem (i.e. without any source term, s(x,t) = 0) is bounded independently of the discretization 
steps. More precisely, the numerical scheme is said to be stable uniformly on interval [0, i max ] if 
the discrete solution of the problem without any source term satisfies 



(22) 3a,Ci,C 2 >0, Vt € [0,i max ], VAa:,At>0, V^x 2 + At 2 < a 



(i -> P^ t{t) ) \\ Ax < (Ci + <?2*)(|boh|| A , + l|PohlU h(c) + IbihlUJ- 



The result to be formally proved at Section 13.11 states that if the continuous solution p is 
regular enough on f2 x [0,t max ] and if the discretization steps satisfy the CFL(£) condition, then 
the three-point scheme is convergent of order (2, 2) uniformly on interval [0,i max ]. 

We do not admit (nor prove) the Lax equivalence theorem which stipulates that for a wide 
variety of problems and numerical schemes, consistency implies the equivalence between stability 
and convergence. Instead, we establish that consistency and stability implies convergence in the 
particular case of the one-dimensional acoustic wave equation. 



2.4 Program 

The main part of the C program is listed in Listing [TJ 

The grid steps Ax and At are respectively represented by the variables dx and dt, the grid sizes 
*m ax and /c max by the variables ni and nk, and the propagation velocity c by the variable v. The 
initial position poh is represented by the function pO. The initial velocity p\\ x and the source term 
Sh are supposed to be zero and are not represented. The discrete solution p^ is represented by the 
two-dimensional array p of size (i max + 1) x (fc max + 1). (This is a simple naive implementation, 
a more efficient implementation would store only two time steps.) 

To assemble the stiffness matrix (c) , we only have to compute the square of the CFL coeffi- 
cient ^ (lines 1-2). Then, we recognize the space loops for the initial conditions: Equation (jlip 
on lines 6-8, and Equation (fTU|) with pu, = on lines 14-17. The space-time loop on lines 23- 
28 for the evolution problem comes from Equation ((5]). And finally, the boundary conditions of 
Equation (fT2"|) are spread out on lines 9-10, 18-19, and 29-30. 



3 Bounding Errors 
3.1 Method Error 

We first present the notions necessary to formalize and prove the method error. Then, we detail 
how the proof is conducted: we establish the consistency, the stability and prove that these two 
properties imply convergence in the case of the one-dimensional acoustic wave equation. 

3 Note that Ax tending toward actually means that i max goes to infinity. 
4 See Section 13.1.1 1 for the precise definition of the big O notation. 
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Listing 1: The main part of the C code, without annotations. 

/* Compute the constant coefficient of the stiffness matrix. */ 
al = dt/ dx*v ; 
a = al*al ; 

/* First initial condition and boundary conditions . */ 
/* Left boundary. */ 
p[0][0] = 0.; 

/* Time iteration —1 = space loop. */ 
for (i=l; i<ni ; i++) { 
p[i][0] = p0(i*dx); 

} 

/* Right boundary . */ 
p[ni][0] = 0.; 

/* Second initial condition (with pl~0) and boundary conditions . */ 
/* Left boundary . */ 
p[0][l] = 0.; 

/* Time iteration = space loop. */ 
for (i=l; i<ni ; i++) { 

dp = p[i+l][0] - 2.*p[i][0] + p[i -1][0]; 

p[ i ] [1] = p[ i ] [0] + 0.5*a*dp; 

} 

/* Right boundary. */ 
P[ni][l] = 0.; 

/* Evolution problem and boundary conditions . */ 
/* Propagation — time loop. */ 
for (k=l; k<nk; k++) { 

/* Left boundary . */ 

p[0][k+l] = 0.; 

/* Time iteration k = space loop. */ 
for ( i=l; i<ni ; i++) { 

dp = p[i+l][k] - 2.*p[i][k] + p[i-l][k]; 

p[i][k+l] = 2.*p[i][k] - p[i][k-l] + a*dp; 

} 

/* Right boundary . */ 
p[ni][k+l] = 0.; 

} 
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3.1.1 Big O, Differentiability, and Regularity 

When considering a big O equality a — 0(b), one usually assumes that a and b are two expressions 
defined over the same domain and its interpretation as a quantified formula comes naturally. Here 
the situation is a bit more complicated. Consider 

/(x, Ax) = 0( 9 (Ax)) 

when || Ax|j goes to 0. If one were to assume that the equality holds for any x e R 2 , one would 
interpret it as 

Vx,3a > 0,3C> 0,VAx, ||Ax|| < a ^> |/(x,Ax)| <C-|g(Ax)|, 

which means that constants a and C are in fact functions of x. Such an interpretation happens 
to be useless, since the inhmum of a may well be zero while the supremum of C may be +oo. 

A proper interpretation requires the introduction of a uniform big O relation with respect to 
the additional variable x: 

(23) 3a > 0, 3C > 0, Vx e O x ,VAx e Q Ax ,||Ax|| < a => |/(x,Ax)| < C ■ \g(Ax)\. 

To emphasize the dependency on the two subsets f2 x and f^Ax, uniform big O equalities are 
now written 

/(x,Ax) = On xlfW (<?(Ax)). 

We now precisely define the notion of "sufficiently regular" functions in terms of the full-fledged 
notation for the big O. The further result on the convergence of the numerical scheme requires 
that the solution of the continuous equation is actually sufficiently regular. We introduce two 
operators that, given a real-valued function / defined on the 2D plane and a point in the plane, 
return the values ^ and ^ at this point. Given these two operators, we can define the usual 2D 
Taylor polynomial of order n of a function /: 

TP„(/, x )«(Az,A<)~ ±1 (± (») .^£_ W .*-. M-A . 

p=0 r \ m= o \ / / 

Let f2 x C R 2 . We say that the previous Taylor polynomial is a uniform approximation of 
order n of / on f2 x when the following uniform big O equality holds: 



/(x+ Ax) - TP„(/,x)(Ax) = 0x , R2 (|| Ax 



\n+l\ 



A function / is then said to be sufficiently regular of order n uniformly on f2 x when all its 
Taylor polynomials of order smaller than n are uniform approximations of / on f2 x . 

3.1.2 Consistency 

The consistency of a numerical scheme expresses that, for Ax small enough, the continuous 
solution taken at the points of the grid almost solves the numerical scheme. More precisely, 
we formally prove that when the continuous solution of the wave equation ((T])-(EI1) is sufficiently 
regular of order 4 uniformly on [x m i n , x max ] x [0, t max ], the numerical scheme (|5j)- (fT2")) is consistent 
with the continuous problem at order (2, 2) uniformly on interval [0,£ max ] (see definition (|21[) 
in Section |2.3|) . This is obtained using the properties of Taylor approximations; the proof is 
straightforward while involving long and complex expressions. 

The key idea is to always manipulate uniform Taylor approximations that will be valid for all 
points of all grids when the discretization steps goes down to zero. 

For instance, to take into account the initialization phase corresponding to Equation (|10j) . we 
have to derive a uniform Taylor approximation of order 1 for the following continuous function 
(for any v sufficiently regular of order 3) 

. . v(x, t + At) — v(x, t) At 2 v(x + Ax, t) ~ 2v(x, t) + v(x — Aa;, t) 
((x,t),(Ax,At)) » -c — . 
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Note that the expression of this function involves both x + Ax and x — Ax, meaning that we 
need a Taylor approximation which is valid for both positive and negative growths. The proof 
would have been impossible if we had required < Ax (as a space grid step) in the definition of 
the Taylor approximation. 

In contrast with the case of an infinite string [13] . we do not need here a lower bound for Ct^. 



3.1.3 Stability 

The stability of a numerical scheme expresses that the growth of the discrete solution is somehow 
bounded in terms of the input data (here, the Cauchy data uoh and Uih, and the source term s n ). 
For the proof of the round-off error (see Section 13.211 , we need a statement of the same form as 
definition (|22|) of Section [231 Therefore, we formally prove that, under the CFL(£) condition (|T5|). 
the numerical scheme (!§))- (IT2"T) is stable uniformly on interval [0,i max ]. 

But, as we choose to prove the convergence of the numerical scheme by using an energetic 
techniqu^l, it is more convenient to formulate the stability in terms of the discrete energy. More 
precisely, we also formally prove that under the CFL(£) condition (|15l) . the discrete energy (|14l) 
satisfies the following overestimation, 



^ E h (c)( Ph ) k + L 2 < ^ h (c)(ph)3 + 



V2 



1 1-> s. 



k' = l 



for all t € [0, imax] and with k = I ^jj — 1. 

The evolution of the discrete energy between two consecutive time steps is shown to be pro- 
portional to the source term. In particular, the energy is constant when the source is inactive. 
Then, we obtain the following underestimation of the discrete energy, 



Vfc, - 1 



AT 

'Ax . 



„fc+i 



Pi 



At 



< E h (c)(p h 



Therefore, the non-negativity of the discrete energy is directly related to the CFL(£) condition. 
Note that this stability result is valid for any input data poh, Pit, and Sh- 



3.1.4 Convergence 

The convergence of a numerical scheme expresses the fact that the discrete solution gets closer to 
the continuous solution as the discretization steps go down to zero. More precisely, we formally 
prove that when the continuous solution of the wave equation (J])-([3]) is sufficiently regular of 
order 4 uniformly on [x m i n ,x max ] x [0, t roax ], and under the CFL(£) condition (|15[) . the numerical 
scheme ([£))) - (Tj"2"]) is convergent of order (2, 2) uniformly on interval [0, t max ] (see definition (|2T)|) in 
Section l2~3l). 

Firstly, we prove that the convergence error eh is itself the discrete solution of a numerical 
scheme of the same form but with different input dat£^| In particular, the source term (on the 
right-hand side) is here the truncation error £h associated with the initial numerical scheme for 
Ph- Then, the previous stability result holds, and we have an overestimation of the square root 
of the discrete energy associated with the convergence error E\ l {c){e\ l ) that involves a sum of the 
corresponding source terms, i.e. the truncation error. Finally, the consistency result also makes 
this sum a big O of Ax 2 + At 2 , and a few more technical steps conclude the proof. 



3.2 Round-off Error 

As each operation is done with IEEE-754 floating-point numbers [17] > round-off errors will occur 
and may endanger the accuracy of the final results. On this program, naive forward error analysis 

5 The popular alternative, using the Fourier transform, would have required huge additional Coq developments. 
6 Of course, there is no associated continuous problem. 
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gives an error bound that is proportional to 2 fe 2 -53 for the computation of a p\ . If this bound was 
sensible, it would cause the numerical scheme to compute only noise after a few steps. Fortunately, 
round-off error actually compensate themselves. To take into account the compensations and hence 
prove a usable error bound, we need a precise statement of the round-off error [12) to exhibit the 
cancellations made by the numerical scheme. 

3.2.1 Local Round-off Errors 

Let S k be the (signed) floating-point error made in the two lines computing p k (lines 26-27 in List- 
ing H}. Floating-point values as computed by the program will be underlined: a, p k to distinguish 
them from the discrete values of previous sections. They match the expressions a and p[i][k] in the 
annotations, while a and p k can be represented in the annotations by \exact(a) and \exact(p[i][k]), 
as described in Section f4. 1.41 
The 5 k are defined as follow: 

S k+i = £ fe+i _ _ E k-i + a x ^ _ 2g * + E k_jy 
Note that the program explained in Section [2^41 gives us that 

£- +i = fl (%--2r i +sx(^i- 2 ^+^ 1 )) 

where fl(-) means that all the arithmetic operations that appear between the parentheses are 
actually performed by floating-point arithmetic, hence a bit off. 

In order to get a bound on 5 k , we need to have the range of p k . For this bound to be usable 
in our correctness proof, we need the range to be [—2, 2]. We have proved this fact by using the 
bounds on the method error and the round-off error of all the p k and p l . 

To prove the bound on 8 k , we perform forward error analysis and then use interval arithmetic 
to bound each intermediate error. We prove that, for all i and k, we have \6 k \ < 78 x 2~ 52 for a 
reasonable error bound for a, that is to say \a — a\ < 2 -49 . 

3.2.2 Convolution of Round-off Errors 

Note that the global floating-point error A k — p k — p k depends not only on 5 k , but also on 
all the 5 k ^!j for < I < k and —l<j<l. Indeed round-off errors propagate along floating- 
point computations. Their contributions to A^, which are independent and linear (due to the 
structure of the numerical scheme), can be computed by performing a convolution with a function 
A : (Z x Z) -> I. This function A represents the results of the numerical scheme when fed with a 
single unit value: 

Ag = 1 Vi ^ 0, A? = 
X 1 _ 1 =X\ = a A* =2(1 -a) Vi ^ {-1,0, 1}, A, 1 = 
A^ = a x (A^ 1 + A*-/) + 2(1 - a) x A^ 1 - \ k ~ 2 

Theorem 1. 

1=0 j = -l 

Details of the proof can be found in j!2) , but this point of view using convolution is new. The 
proof mainly amounts to performing numerous tedious transformations of summations until both 
sides are proved equal. 

The previous proof assumes that the double summation is correct for all (i', k') such that 
k 1 < k. This would be correct if there was an unbounded set of i where p k is computed. This is 
no longer the case for a finite string. Indeed, at the ends of the range (i = or i max ), p k and p k 
are equal to 0, so A k has to be too. 
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The solution is to define the successive antisymmetric extension in space (as is done for 
d'Alembert's formula in Section 12. ip and to use it instead of 6. This ensures that both Ag 
and A^ are equal to 0. It does not have any consequence on the values of for < i < i max . 

3.2.3 Bound on the Global Round-off Error 

The analytic expression of Af can be used to obtain a bound on the round-off error. We will need 
two lemmas for this purpose. 



Lemma 1. V" Af = k + 1. 



Proof. We have 

+oo +oo +oo +oo +oo +oo 

i— — oo i— — oo i— — oo z— — oo — oo i— — oo 

The sum by line verifies a simple linear recurrence. As — 1 an d ^ Aj = 2, we have 

VAf /■• • 1.' ^ ^ □ 

Lemma 2. A* > 0. 

Proof. The demonstration was found out by M. Kauers and V. Pillwein. 
If we denote by P the Jacobi polynomial, we have 

* = 1 (A) ("i +, <-» ,+v - s - 2 °> 

Now the conjecture follows directly from the inequality of Askey and Gasper [3], which asserts 
that ELo p k r, °\ x ) > for r > -1 and -1< x < 1 (see Theorem 7.4.2 in The Red Book 0). □ 



Theorem 2. 



A? = 



P- - Pi 



< 78 x 2~ 53 x (k + 1) x (fc + 2). 



Proof. According to Theorem [I] A**' is equal to 53;=o Ej=-i ^i+J - We know that for all j and 
|<5j | < 78 x 2~ 52 and that ^ =2 + 1. Since the Af are nonnegative, the error is easily bounded 
by 78 x 2- 52 x £^(1 + 1). D 

3.3 Total Error 

Let £h be the total error. It is the sum of the method error (or convergence error) eh of Sections 12. 31 
and 13.1.41 and of the round-off error A>, of Section 13.21 

From Theorem [2j we can estimate^ the following upper bound for the spatial norm of the 
round-off error when Aa; < 1 and At < t max /2: for all t £ [0,t max ], 



i ^ A, feAt(t) 



= x E(a^«) 2 a, 

\ i=0 

< v/(w + l) Ax x 78 x 2- 53 x + l) x + : 

< V^max - X mi „ + 1 X 78 X 2~ 53 X 3 X 



7 W hen ^ > 2, we have ( W + l) + 2) < 3%p 
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Thus, from the triangular inequality for the spatial norm, we obtain the following estimation 
of the total error: 



Vt 



G [0,t max ], VAx, || Ax|| <min(a e ,a A )^ || (*«->■ £- AtW ) || < C e (Ax 2 + At 2 ) + ^| 



where the convergence constants a e and C e were extracted from the Coq proof (see Section 13.1.41) 
and are given in terms of the constants for the Taylor approximation of the exact solution at 
degree 3 (03 and C3), and at degree 4 (0:4 and C4) by 



a e = min(l,t max ,a 3 ,a 4 ), 



a 



Ce 2/i^ max \/x max ^min ( ^^ max 

with ;u = ^ 2 , C" = max(l,C 3 + c 2 C 4 + 1), and C" = max(C",2(l + c 2 )C 4 ), and where the 
round-off constants oa and Ca, as explained above, are given by 

OA = min(l,t max /2), 
Ca = 234 x 2 53 x t 2 iax V imai 




Time step logl 0<dt> Space slep loglO(d^) Spacestepdx 



Figure 3: Upper bound for the total error in log-scale. Left: for Ax and At satisfying the 
CFL condition. The lighter area (in yellow) represents the higher values above 10 4 , whereas the 
darker area represents the lower values below 10 _1 . Right: for an optimal CFL condition with 
At — — -Ax. The green crosses represent the effective total error computed by the C program 
for a few values of the space step. 

To give an idea of the relative importance of both errors, we consider the academic case where 
the space domain is the interval [0,1], the velocity of waves is c = 1, and there is no initial 
velocity (u\{x) = 0) nor source term (s(x,t) = 0). We suppose that the initial position is given 
by Uo(x) = x(2{ x ~ x o)/l) where xq — 0.5, I — 0.25, and x is the C 4 function defined on [—1, 1] 
by x( z ) = (cos( : | z)) 5 , and with null continuation on the real axis. For this function, we may take 
Q3 = a 4 = \/2/2, C3 = 5120"\/2, and C4 = 409600/3. The corresponding solution presents two 
hump-shaped signals that propagate in each direction along the string, see Figure [T] 

The upper bound on the total error is represented in Figure |3] Note that everything is in 
logarithmic scale. Of course, decreasing the size of the grid step decreases the method error, but 
in the same time, it increases the round-off error. Hence, the existence of a minimum for the upper 
bound on the total error (about 0.02 in our test case), corresponding to optimal grid step sizes. 
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Fortunately, the effective total error usually happens to be much smaller than this upper bound 
(by about a factor of 10 6 in our example). 

Even if the effective total error on this example is off by several orders of magnitude with 
respect to the theoretical bound, this experiment is still reassuring. First, the left side of Figure [3] 
shows that the optimal choice (the darker part) for choosing Ax and At is reached near the 
limit of the CFL condition. This property matches common knowledge from numerical analysis. 
Second, the right side shows that both the effective error and the theoretical error have the same 
asymptotic behavior. So the properties we have verified in this work are not intrinsically easier 
than the best theorems one could state. It is just that the constants of the formulas extracted 
from the proofs (which we did not tune for this specific purpose) are not optimal for this example. 

4 Mechanization of Proofs 

In Sections l3.ll and l3.21 we have mostly described the method and round-off errors introduced when 
solving the wave equation problem with the given numerical scheme. We do not yet know whether 
this formalization actually matches the program described in Section 12.41 and fully given in Ap- 
pendix [A] In addition, the program might contain programming errors like out-of-bound accesses, 
which would possibly be left unattended while comparing the program and its formalization. 

To fully verify the program, our process is as follows. First, we annotated the C program with 
comments specifying its behavioral properties, that is, what the program is supposed to compute. 
Second, we let Frama-C/Why generate proof obligations that state that the program matches its 
specification and that its execution is safe. Third, we used automated provers and Coq to prove 
all of these obligations. 

Section I4TT1 presents all the tools we have used for verifying the C program. Then Section |4~21 
explains how the program was annotated. Finally, Section 14.31 shows how we proved all the 
obligations, either automatically or with a proof assistant. 

4.1 Tools 

Several software packages are used in this work. The formal proof of the method error has been 
made in Coq. The formal proof of the round-off error has been made in Coq, and using the Gappa 
tactic. The certification of the C program has used Frama-C (with the Jessie plug- in), and to 
prove the produced goals, we used Gappa, SMT provers, and the preceding Coq proofs. This 
section is devoted to present these tools and necessary libraries. 

4.1.1 Coq 

CocE is a formal system that provides an expressive language to write mathematical definitions, ex- 
ecutable algorithms, and theorems, together with an interactive environment for proving them [5]. 
Coq's formal language is based on the Calculus of Inductive Constructions [21] that combines 
both a higher-order logic and a richly-typed functional programming language. Coq allows to 
define functions or predicates, that can be evaluated efficiently, to state mathematical theorems 
and software specifications, and to interactively develop formal proofs of these theorems. These 
proofs are machine- checked by a relatively small kernel, and certified programs can be extracted 
from them to external programming languages like Objective Caml, Haskell, or Scheme |43j . 

As a proof development system, Coq provides interactive proof methods, decision and semi- 
decision algorithms, and a tactic language for letting the user define its own proof methods. 
Connection with external computer algebra system or theorem provers is also available. 

The Coq library is structured into two parts: the initial library, which contains elementary 
logical notions and data-types, and the standard library, a general-purpose library containing 
various developments and axiomatizations about sets, lists, sorting, arithmetic, real numbers, etc. 

6 http : //coq. inria. f r/ 
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In this work, we mainly use the Reals standard library [45] . that is a classical axiomatization 
of an Archimedean ordered complete field. We chose Reals to make our numerical proofs because 
we do not need an intuitionistic formalization. 

For floating-point numbers, we use a large Coq librarjjf] initially developed in [55] and extended 
with various results afterwards [11]. It is a high-level formalization of IEEE-754 with gradual 
underflow. This is expressed by a formalization where floating-point numbers are pairs (n, e) 
associated with real values n x /3 e . The requirements for a number to be in the format (e m i n ,/3 p ) 
are 

|n| < fi p and e m i n < e. 

This formalization is convenient for human interactive proofs as testified by the numerous 
proofs using it. The huge number of lemmas available in the library (about 1400) makes it 
suitable for a large range of applications. This library has since then been superseded by the 
Flocq library [IB], but it was not yet available at the time we proved the floating-point results of 
this work. 

4.1.2 Frama-C, Jessie, Why, and the SMT Solvers 

We use the Frama-C platforrrF"! to perform formal verification of C programs at the source- 
code level. Frama-C is an extensible framework that combines static analyzers for C programs, 
written as plug-ins, within a single tool. In this work, we use the Jessie plug-in for deductive 
verification. C programs are annotated with behavioral contracts written using the ANSI C 
Specification Language (ACSL for short) [5J. The Jessie plug-in translates them to the Jessie 
language [S] , which is part of the Why verification platform [3T]. This part of the process is 
responsible for translating the semantics of C into a set of Why logical definitions (to model C 
types, memory heap, etc.) and Why programs (to model C programs). Finally, the Why platform 
computes verification conditions from these programs, using traditional techniques of weakest 
preconditions, and emits them to a wide set of existing theorem provers, ranging from interactive 
proof assistants to automated theorem provers. In this work, we use the Coq proof assistant 
( Section liTTj) . SMT solvers Alt-Ergo [19 , CVC3 [SJ and Z3 [35], and the automated theorem 
prover Gappa fSection |4.1.3[) . Details about automated and interactive proofs can be found in 
Section l4~3l The dataflow from C source code to theorem provers can be depicted as follows: 

Coq 



ACSL- annotated 
C program 



Gappa 



More precisely, to run the tools on a C program, we use a graphical interface called gWhy. A 
screenshot is in Appendix [B] In this interface, we may call one prover on one or on many goals. 
We then get a graphical view of how many goals are proved and by which prover. 

In ACSL, annotations are using first-order logic. Following the programming by contract ap- 
proach, the specifications involve preconditions, postconditions, and loop invariants. Contrary to 
other approaches focusing on run-time assertion checking, ACSL specifications do not refer to C 
values and functions, even if pure, but refer instead to purely logical symbols. In the following 
contract for a function computing the square of an integer x 

"http : //lipf orge . ens-lyon . f r/www/pf f / 
1C http : //www. frama- c.cea.fr/] 



Frama-C 
(Jessie plug-in) 
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//<& ensures \result = x * x; 
int square(int x); 

the postcondition, introduced with ensures, refers to the return value \result and argument x. Both 
are denoting mathematical integer values, for the corresponding C values of type int. In particular, 
x * x cannot overflow. Of course, one could give function square a more involved specification that 
handles overflows, e.g. with a precondition requiring x to be small enough. Simply speaking, we 
can say that C integers are reflected within specifications as mathematical integers, in an obvious 
way. The translation of floating-point numbers is more subtle and explained in Section 14.1.41 

4.1.3 Gappa 

The Gappa too0 adapts the interval- arithmetic paradigm to the proof of properties that occur 
when verifying numerical applications |25j . The inputs are logical formulas quantified over real 
numbers whose atoms are usually enclosures of arithmetic expressions inside numeric intervals. 
Gappa answers whether it succeeded in verifying it. In order to support program verification, 
one can use rounding functions inside expressions. These unary operators take a real number 
and return the closest real number in a given direction that is representable in a given binary 
floating-point format. For instance, assuming that operator o rounds to the nearest binary64 
floating-point number, the following formula states that the relative error of the floating-point 
addition is bounded: 

Vx, y £ R, 3e e R, |e| < 2~ 53 A 0(0(2) + o(y)) = (o(x) + o(y)) x (1 + e). 

Converting straight-line numerical programs to Gappa logical formulas is easy and the user can 
provide additional hints if the tool were to fail to verify a property. The tool is specially designed to 
handle codes that are performing convoluted manipulations. For instance, it has been successfully 
used to verify a state-of-the-art library of correctly-rounded elementary functions [27] , In the 
current work, Gappa has been used to check much simpler properties. (In particular, no user hint 
was needed to discharge a proof automatically.) But the length of their proofs would discourage 
even the most dedicated users if they were to be manually handled. One of the properties is the 
round-off error of a local evaluation of the numerical scheme (Section I3.2.ip . Other properties 
mainly deal with proving that no exceptional behavior occurs while executing the program: due 
to the initial values, all the computed values are sufficiently small to never cause overflow. 

The verification of some formulas requires reasonings that are so long and intricate |27j , that it 
might cast some doubts on whether an automatic tool actually succeeded in proving them. This 
is especially true when the tool ends up proving a property stronger than what the user expected. 
That is why Gappa also generates a formal certificate that can be mechanically checked by a proof 
assistant. This feature has served as the basis for a Coq tactic for automatically solving goals 
related to floating-point and real arithmetic [15] . The tactic reads the current Coq goal, generates 
a Gappa goal, executes Gappa on it, recovers the certificate, and converts it to a complete proof 
term that Coq matches against the current goal. At this point, whether Gappa is correct or not 
no longer matters: the original Coq goal is formally proved by a complete Coq proof. 

This tactic has been used whenever a verification condition would have been directly proved 
by Gappa, if not for some confusing notations or encodings of matrix elements. We just had to 
apply a few basic Coq tactics to put the goal into the proper form and then call the Gappa tactic 
to discharge it automatically. 

4.1.4 Floating-Point Formalizations 

A natural question is the link between the various representations of floating-point numbers. 
We assume that the execution environment (mostly the processor) complies with the IEEE-754 
standard 47;, which defines formats, rounding modes, and operations. The C program we consider 
is compiled in an assembly code that will directly use these formats and operations. We also 

n http : //gappa. gf orge . inria.fr/ 
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assume that the compiler optimizations preserve the visible semantics ol floating-operations from 
the original code, e.g. no use of the extended registers. Such optimizations could have been taken 
into account though, but at a cost [T7] . 

When verifying the C program, the floating-point operations are translated by Frama-C/- 
Jessie/Why following some previous work by two of the authors [14]. A floating-point number / is 
modeled in the logic as a triple of real numbers (r, e, to). Value r simply stands for the real number 
that is immediately represented by /; value e stands for the exact value of /, as obtained if no 
rounding errors had occurred; finally, value to stands for the model of /, which is a placeholder 
for the value intended to be computed and filled by the user. The two latter values have no 
existence in the program, but are useful for the specification and the verification. In particular, 
they help state assertions about the rounding or the model error of a program. In ACSL, the 
three components of the model of a floating-point number f can be referred to using f, \exact(f), 
and \model(f), respectively. \round_error(f) is a macro for the rounding error, that is, \abs(f - 
\exact(f)). 

For instance, the following excerpt from our C program specifies the error on the content of 
the dx variable, which represents the grid step Ax (see Section [2]). 



dx = 1 . / n i ; 




/*@ assert 




@ dx > 0. <&& dx <= 


0.5 Mi 


@ \abs(\exact (dx) - 


- dx) / dx <= 0x1. p- 53; 


@ */ 





Note that Oxl.p-53 is a valid ACSL (and C too) literal meaning 2~ 53 . 

Proof obligations are extracted from the annotated C program by computing weakest precondi- 
tions and then translated to automated and interactive provers. For SMT provers, the three fields 
r, e, and to, of floating-point numbers are expressed as real numbers and operations on floating- 
point numbers are uninterpreted relations axiomatized with basic properties such as bounds on the 
rounding error or monotonicity. For Gappa too, the fields are seen as real numbers. The tool, how- 
ever, knows about floating-point arithmetic and its relation to real arithmetic. So floating-point 
operations are translated to the corresponding symbols from Gappa. 

For Coq, we use the formalization described in Section 14.1.11 with a limited precision and 
gradual underflow (so that subnormal numbers are correctly translated). It is based on the real 
numbers of the standard library, which are also used for the translation of the exact and the model 
parts of the floating-point number. 

While the IEEE-754 standard defines infinities and Not-a-Number as floating-point values, our 
translation does not take them into account. This does not compromise the correctness of the 
translation though, as each operation has a precondition that raises a proof obligation to guarantee 
that no exceptional events occur, such as overflow or division by zero, and therefore no infinities 
nor Not-a-Number are produced by the program. 

To summarize, there is one assumption about the actual arithmetic being executed (IEEE-754 
compliant and no overly aggressive optimizations from the compiler) and three formalizations of 
floating-point arithmetic used to verify the program: one used by Jessie/ Why and then sent to 
the SMT solvers, one used by Gappa, and one used by Coq. The combination of these three 
different formalizations does not introduce any inconsistency. Indeed, we have formally proved in 
Coq that Gappa's and Coq's formalizations are equivalent for floating-point formats with limited 
precision and gradual underflow, that is, IEEE-754 formats. We have also formally proved that 
the Jessie/ Why specifications and the properties for SMT provers are compatible with these for- 
malizations, including the absence of special values (infinity or Not-a-Number) and the possibility 
to disregard the upper bound on reals representing floating-point numbers. 

In fact, there is a fourth formalization of floating-point arithmetic involved, which is the one 
used internally by the interval computations of Gappa for proving results about real-valued ex- 
pressions. It is not equivalent to the previous ones, since it is a multi-precision arithmetic, but 
it has no influence whatsoever on the formalization that Gappa uses for modeling floating-point 
properties. 
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4.2 Program Annotations 

The full annotations are given in Appendix [X] We give here hints about how to specify this 
program. 

There are two axiomatics. The first one corresponds to the mathematics: the exact solution 
of the wave equation and its properties. It defines the needed values (the exact solution p, and its 
initialization po). We here assume that s and p\ are zero functions. It also defines the derivatives 
of p {psol\, first derivative for the first variable of p, and psoln, second derivative for the first 
variable, and psol2 and PS0I22 for the second variable) as functions such that their value is the limit 
of p( a: + z ^)~p( :r ) wnen Ax — > 0. As the ACSL annotations are only first order, these definitions are 
quite cumbersome: each derivative needs 5 lines to be defined. 

We also put as axioms the fact that the solution has the expected properties |T}f5]). The 
last property needed on the exact solution is its regularity. We require it to be near its Taylor 
approximations of degrees 3 and 4 on the whole interval [x m i n , £ max ]. For instance, the following 
annotation states the property for degree 3. 

/*@ axiom pso /_su ff.regu I a r_3 : 
@ < alpha. 3 && < C.3 && 

IS \forall real x; \ forall real t; \forall real dx; \ forall real dt; 

@ <= x <= 1 =^> \sqrt(dx * dx + dt * dt) <= alpha. 3 

@ \abs( psol (x + dx , t + dt) - psoLTaylor.3 (x , t, dx , dt))<= 

@ C.3 * \abs(\pow(\sqrt (dx * dx + dt * dt), 3)); 

@*/ 

The second axiomatic corresponds to the properties and loop invariant needed by the program. 
For example, we require the matrix to be separated: it means that a line of the matrix should 
not mix with another line (or a modification could alter another point of the matrix). We also 
state the existence of the loop invariant analytic_error that is needed for applying the results of 
Section O 

The initializations functions are specified, but not stated. This corresponds firstly to the 
function array2d_alloc that initializes the matrix and p_zero that produces an approximation of the 
Po function. Our program verification is modular: our proofs are generic with respect to po and 
its implementation. 

The preconditions of the main functions are the following ones: 

• ^max and & max must be greater than one, but small enough so that i max ~\~ 1 and fc max + 1 do 
not overflow; 

• the grid sizes Ax must fulfill some mathematical conditions that are required for the con- 
vergence of the scheme; 

• the floating-point values computed for the grid sizes must be near their mathematical values; 

• to prevent exceptional behavior in the computation of a, the time discretization step must 
be greater than 2 -1000 and must be greater than 2 -500 . 

There are two postconditions, corresponding to the method and round-off errors. See Sec- 
tions [23] an d GEH for more details. 

4.3 Automation and Manual Proofs 

This section is devoted to formal specifications and proofs corresponding to the bounds proved in 
Section [3J We give some key points of the automated proofs. 

Big O. In section lB.l.ll we present two interpretations of the big O notation. Usual mathematical 
pen-and-paper proofs switch from one interpretation to the other depending on which one is the 
most adapted, without noticing that they may not be equivalent. The formal development was 
helpful in bringing into light the erroneous reasoning hidden by the usage of big O notations. We 
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introduced the notion of uniform big O in [13] in the context of an infinite string. In the present 
paper, we consider the case of the finite string, hence for compactness reasons, both notions are in 
fact equivalent. However, we still use the more general uniform big O notion to share most of the 
proof developments between the finite and the infinite cases. Regarding automation, a decision 
procedure has been developed in [4]; unfortunately, those results were not applicable since we 
needed a more powerful big O. 

Differential operators. As long as we were studying only the method error, we did not have 
to define the differential operators nor assume anything about them (T3]. Their only properties 
appeared through their usage: function p is a solution of the partial differential equation and 
it is sufficiently regular. This is no longer possible for the annotated C program. Indeed, due 
to the underlying logic, the annotations have to define p as a solution of the PDE by using 
first-order formulas stating differentiability, instead of second-order formulas involving differential 
operators. Since the formalization of Taylor approximations has been left unchanged, the natural 
way to relate the C annotations with the Coq development is therefore to define the operators as 
actual differential operators. Note that this has forced us to introduce a small axiom. Indeed, our 
definition of Taylor approximation depends on differential operators that are total functions, while 
Coq's standard library defines only partial operators. So we have assumed the existence of some 
total operators that are equal to the partial ones whenever applied to differentiable functions. 
The axiom states absolutely nothing about the result of these operators for nondifferentiable 
functions, so no inconsistencies are introduced this way. This is just a specific instance of Hilbert 
e operator [57 , which does not make the logic inconsistent [41) . 

Method error. The Coq proof of the method error is about 5000-line long. About half of it 
is dedicated to the wave equation and the other half is re-usable (definition and properties of the 
dot product, the big O, Taylor expansions. . . ). We formally proved without any axiom that the 
numerical scheme is convergent of order 2, which is the known mathematical result. An interesting 
aspect of the formal proof in Coq is that we were able to extract the constants a and C appearing 
in the big O for the convergence result in order to obtain their precise values. The recursive 
extraction was fully automatic after making explicit some inlining. The mathematical expressions 
are given in Section [3.31 

Round-off errors. Except for Lemma [21 all the proofs described in section 13.21 have been 
done and machined-checked using Coq. In particular, the proof of the bound on 6f was done 
automatically by calling Gappa from Coq. Lemma [5] is a technical detail compared to the rest 
of our work, that is not worth the immense Coq development it would require: keen results on 
integrals but also definitions and results about the Legendre, Laguerre, Chebychev, and Jacobi 
polynomials. 

The program proof. Given the program code, the Why tool generates 149 verification condi- 
tions that have to be proved. While possible, proving all of them in Coq would be rather tedious. 
Moreover, it would lead to a rather fragile construct: any later modification to the code, however 
small it is, would cause different proof obligations to be generated, which would then require ad- 
ditional human interaction to adapt the Coq proofs. We prefer to have automated provers (SMT 
solvers and Gappa) discharge as many of them as possible, so that only the most intricate ones are 
left to be proven in Coq. The following table shows how many goals are discharged automatically 
and how many are left to the userF^I 

12 Note that verification conditions might be discharged by one or several automated provers. 
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Prover 


Proved Behavior VC 


Proved Safety VC 


Total 


Alt-Ergo 


18 


80 


98 


CVC3 


18 


89 


107 


Gappa 


2 


20 


22 


Z3 


21 


63 


84 


Automatically proved 


23 


94 


117 


Coq 


21 


11 


32 


Total 


44 


105 


149 



On safety goals (matrix access, loop variant decrease, overflow), automatic provers are helpful: 
they prove about 90 % of the goals. On behavior goals (loop invariant, assertion, postcondition), 
automatic provers succeed for half of the goals. As our loop invariant involves an uninterpreted 
predicate, the automatic provers cannot prove all the behavior goals (they would have been too 
complicated anyway). That is why we resort to an interactive higher-order theorem prover, namely 
Coq. 

Coq proofs are split into two sets: first, the mathematical proof of convergence and second, 
the proofs of bounded round-off errors and absence of runtime errors. Appendix [C] displays the 
layout of the Coq formalization. 

The following tabular gives the compilation times of the Coq files on a 3-GHz dual core machine. 



Type of proofs 


Nb spec lines 


Nb lines 


Compilation time 


Convergence 


991 


5 275 


42 s 


Round-off + runtime errors 


7 737 


13175 


32 min 



Note that most theorem statements regarding round-off and runtime errors are automatically 
generated (7321 lines out of 7737) by the Frama-C/ Jessie/Why framework. 

The compilation time may seem prohibitive; it is mainly due to the size of the theorems and to 
calls to the omega decision procedure for Presburger arithmetic. The difficulty does not lie in the 
arithmetic statement itself, but rather in a large number of useless hypotheses. In order to reduce 
the compilation time, we could manually massage the hypotheses to speed up the procedure, but 
this would defeat the point of using an automatic tactic. 

5 Conclusion 

In the end, having formally verified the C program means that all of the proof obligations gen- 
erated by Frama-C/ Jessie/ Why have been proved, either by automated tools or by Coq formal 
proofs. These formal proofs depend on some axioms specific to this work: the fact about Jacobi 
polynomials, the existence and regularity of a solution to the EDP, and the existence of differen- 
tial operators. The last two have been tackled by subsequent works, which means that the only 
remaining Coq axiom is the one about Jacobi polynomials. 

We succeeded in verifying a C program that implements a numerical scheme for the resolution 
of the one-dimensional acoustic wave equation. This is comprised of three sets of proofs. First we 
formalized the wave equation and proved the convergence of a scheme for its numerical resolution. 
Second we proved that the C program behaves safely: no out-of-bound array accesses and no 
overflow during floating-point computations. Third we proved that the round-off errors are not 
causing the numerical results to go astray. This is the first verification of this kind of program 
that covers all its aspects, both mathematics and implementation. 

This work shows a tight synergy between researchers from applied mathematics and logic. 
Three domains are intertwined here: applied mathematics for an initial proof that was enriched 
and detailed upon request, computer arithmetic for smart bounds on round-off errors, and formal 
methods for machine-checking them. This may be the reason why such proofs never appeared 
before, as this kind of collaboration is uncommon. 
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Each proof came with its own hurdles. For ensuring the correct behavior of the program, 
the most tedious point was to prove that setting a result value did not cause other values to 
change, that is, that all the lines of the matrix are properly separated. In particular, verifying the 
loop invariant requires checking that, except for the new value, the properties of the memory are 
preserved. An unexpectedly tedious part was to check that the program actually complies with 
our mathematical model for the numerical scheme. 

Another difficulty lies in the mathematical proof itself. We based our work on proofs found in 
books, courses, and articles. It appears that pen-and-paper proofs are sometimes sketchy: they 
may be fuzzy about the needed hypotheses, especially when switching quantifiers. We have also 
learned that filling the gaps may cause us to go back to the drawing board and to change the basic 
blocks of our formalization to make them more generic (e.g. devising a big O that needs to be 
uniform and also generic with respect to a property P). 

An unexpected side effect of having performed this formal verification in Coq is our ability to 
automatically extract the constants hidden inside the proofs. That way, we are able to explicitly 
bound the total error rather than just having the usual 0(Ax 2 + At 2 ) bound. In particular, we 
can compare the magnitudes of the method error and round-off error and then decide how to scale 
the discretization grid. 

Coq could have offered us more: it would have been possible to describe and prove the algorithm 
directly in Coq. The same formalism would have been used all the way long, but we were more 
interested in proving a real-life program in a real-life language. This has shown us the difficulties 
lying in the memory handling for matrices. In the end, we have a C code with readable annotations 
instead of a Coq theorem and that seems more convincing to applied mathematicians. 

For this exploratory work, we considered the simple three-point scheme for the one-dimen- 
sional wave equation. Further works involve scaling to higher-dimension. The one-dimensional 
case showed us that summations and finite support functions play a much more important role in 
the development than we first expected. We are therefore moving to the SSReflect interface and 
libraries for Coq [5], so as to simplify the manipulations of these objects in the higher-dimensional 
case. 

This example also exhibits a major cancellation of rounding errors and it would be interesting 
to see under which conditions numerical schemes behave so well. 

Another perspective is to generalize our approach to other higher-order numerical schemes for 
the same equation, and to other PDEs. However, the proofs of Section 13.11 are entangled with 
particulars of the presented problem, and would therefore have to be redone for other problems. 
So a more fruitful approach would be to prove once and for all the Lax equivalence theorem that 
states that consistency implies the equivalence between convergence and stability. This would 
considerably reduce the amount of work needed for tackling other schemes and equations. 
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A Source Code 



/*@ axiomatic dirichlet. maths { 
@ 

@ logic real c; 

@ logic real pO(real x); 

@ logic real psol(real x, real t); 

@ axiom epos : < c; 

@ logic real psoLl(real x, real t ) ; 

@ axiom psol.l.def : 

@ \forall real x; \forall real t; 

@ \forall real eps; \exists real C; 0< C &&\forall real dx; 
@ \abs(dx) < C=> 

@ \abs(( psol (x + dx , t) — psol (x, t)) / dx — psoLl (x, t))< eps ; 

@ logic real psol. 11( real x, real t); 

@ axiom psol .11 .def : 

@ \forall real x; \forall real t; 

@ \forall real eps; \exists real C; 0< C&&\forall real dx; 
@ \abs(dx) < C => 

@ \abs(( psoLl (x + dx, t) - psoLl(x, t)) / dx - psoLl 1 (x , t))<eps; 

@ logic real psoL2( real x, real t); 

@ axiom psoL2.def : 

@ \forall real x; \forall real t; 

@ \forall real eps; \exists real C; 0< C&&.\forall real dt ; 
@ \abs(dt) < C => 

@ \abs((psol(x, t + dt) - psol(x, t)) / dt - psoL2(x, t)) < eps; 

@ logic real psoL22( real x, real t); 

IS axiom psol.22.def : 

@ \forall real x; \forall real t; 

@ \forall real eps; \exists real C; 0< C && \ forall real dt ; 
@ \abs(dt) < C=> 

@ \abs(( psoL2(x, t + dt) - psoL2(x, t)) / dt - psoL22(x, t)) < eps; 

@ axiom wave.eq.O: \ forall real x; <= x <= 1 psol(x, 0) = pO(x); 

<S axiom wave.eq.l: \ forall real x; <= x <= 1 psol.2(x, 0) = 0; 

@ axiom wave.eq.2 : 

@ \forall real x; \forall real t; 

@ <= x <= 1 psoL22(x, t) - c * c * psoLll(x, t) = 0; 

@ axiom wave.eq.dirichlet.l : \ forall real t; psol(0, t) = 0; 
@ axiom wave.eq.dirichlet.2 : \ forall real t; psol(l, t) = 0; 

@ logic real psol _T 'ay lor .3 ( real x, real t, real dx , real dt); 
@ logic real psol .T aylor.4 ( real x, real t, real dx , real dt); 

@ logic real alpha. 3; logic real C.3 ; 
@ logic real alpha. 4; logic real C.4; 

@ axiom pso l.su ff.regu I a r.3 : 
@ < alpha. 3 && < C.3 && 

@ \forall real x; \forall real t; \forall real dx; \forall real dt ; 

@ <= x <= 1 =^> \sqrt(dx * dx + dt * dt) <= alpha. 3 ^> 

@ \abs(psol(x + dx, t + dt) - psol. Taylor. 3 (x , t, dx, dt))<= 

<S C.3 * \abs(\pow(\sqrt (dx * dx + dt * dt), 3)); 

IS axiom pso l.su ff.regu I a r. 4 : 
@ < alpha.4 && < C.4 && 

@ \forall real x; \forall real t; \forall real dx; \forall real dt ; 

@ <= x <= 1 =^> \sqrt(dx * dx + dt * dt) <= alpha.4 ^> 

@ \abs(psol(x + dx, t + dt) - psol.T aylor.4 (x , t, dx, dt))<= 

@ C.4 * \abs(\pow(\sqrt (dx * dx + dt * dt), 4)); 
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@ axiom pso/./e : 

@ \forall real x; \forall real t; 

@ <= x <= 1 => <= t => \abs(psol (x, t)) <= 1; 

@ logic real Tjnax; 

@ axiom T.max.pos : < 7"_max; 

@ logic real C.conv; logic real alpha.conv ; 
@ lemma alpha. conv. pos : 0< alpha.conv; 
@ 

@ } */ 



/*<§ axiomatic d i ri ch I et.prog { 
@ 

@ predicate analytic. error{L} 

IS (double **p, integer ni , integer i, integer k, double a, double dt) 

@ reads p[. .][. .] ; 

@ 

@ lemma analytic .error _/e{L} : 

@ \forall double **p; \forall integer ni ; \forall integer i; 

@ \forall integer nk; \ forall integer k; 

@ \forall double a; \ forall double dt ; 

@ < ni <= / <= ni => <= k =^> 

@ < \exact(dt) =^> 

@ a na lytic. error (p , ni , i , k, a, dt) =^> 

@ \sqrt(l. / (ni * ni) + \exact(dt) * \exact(dt)) < alpha.conv =^> 

@ k <— nk =^> nk <= 7598581 =^> nk * \exact(dt) <= T.max =^> 

@ \exact(dt) * ni * c <= 1 - 0xl.p-50^> 

IS \forall integer il; \ forall integer kl ; 

@ <= il <= ni <= kl < k 

@ \abs(p[il ][kl]) <= 2; 

@ 

@ predicate separated. matrix{L}(double **p, integer leni) = 

(9 \forall integer i; \forall integer j; 

@ <— i < leni =^> <= j < leni =^> i != j =^> 

@ \base.addr(p[ i ]) != \base_addr (p[j ] ) ; 

@ 

<S logic real sqr.norm.dx.conv.err{L} 

IS (double **p, real dx , real dt , integer ni , integer i , integer k) 

@ reads p[. .][. .] ; 

<S logic real sqr( real x) = x * x; 

@ lemma sqr.norm.dx.conv.err.O{L} : 

@ \forall double **p; \ forall real dx; \ forall real dt; 

@ \forall integer ni; \ forall integer k; 

@ sqr.norm.dx.conv.err (p , dx, dt , ni , 0, k) = 0; 

@ lemma sqr. norm. dx.conv. err. succ{L} : 

@ \forall double **p; \ forall real dx; \ forall real dt ; 

@ \forall integer ni; \forall integer i; \forall integer k; 

@ 0<= i => 

@ sqr.norm.dx.conv.err (p , dx, dt , ni , i + 1, k) = 

@ sqr.norm.dx.conv.err (p , dx , dt , ni , i, k) + 

@ dx * sqr( psol (1. * i / ni , k * dt) — \exact ( p [ i ] [ k] ) ) ; 

<S logic real norm.dx.conv.err{L} 

@ (double **p, real dt , integer ni , integer k) = 

@ \sqrt ( sqr.norm.dx.conv.err (p , 1. / ni , dt , ni , ni , k)); 

@ 

@ } */ 



/*@ requires leni >= 1 && lenj >= 1; 
@ ensures 

@ \valid.range(\result , 0, leni - 1) &&. 
@ (\forall integer i; <— i < leni => 
@ \valid. range (\ result [ i ] , 0, lenj — 1)) 
@ separated. matrix (\ result , leni); 
@ */ 
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double **a r ray2d_a I loc ( int leni, int lenj); 



/*@ requires ( I != 0); 
@ ensures 

@ \round_error(\ resu It ) <= 14 * 0x1. p— 52, 
@ \exact(\ result ) = p0(\ exact (x) ); 
@ */ 

double p.zero (double xs , double I, double x); 



/*@ requires 

@ ni >= 2 &&. nk>= 2 && I != && 

@ dt > 0. && \exact(dt) > 0. && 

@ \exact(v) = c && \exact(v) = v && 

@ 0x1. p- 1000 <= \exact(dt) && 

@ ni <= 2147483646 && nk <= 7598581 && 

@ nk * \ exact (dt) <= Tjnax && 

@ \abs(\exact(dt) - dt) / dt <= Oxl.p-51 && 

@ 0x1 .p-500 <— \exact(dt) * ni * c <= 1 - 0x1 .p-50 &&. 

@ \sqrt(l. / (ni * ni) +\exact(dt) * \exact(dt)) < alpha.conv; 

@ 

@ ensures 

@ \forall integer i; \forall integer k; 

@ <= / <= ni =^> <= k <= nk =^> 

@ \round.error(\result [i ][k]) <= 78. / 2 * Oxl.p-52 * (k + 1) * (k + 2); 
@ 

@ ensures 

@ \forall integer k; <= k <— nk =^> 

IS norm_dx_conv_err (\ resu It , \exact(dt), ni , k) <= 

@ C_conv * (1. / (ni * ni) + \exact(dt) * \exact(dt)); 

@ */ 

double **forwa rd.prop ( int ni , int nk, double dt , double v, 
double xs , double I ) { 

/* Output variable. */ 
double **p; 

/* Local variables . */ 
int i , k ; 

double al , a , dp , dx ; 

dx = l./ni ; 
/*@ assert 

@ dx > 0. && dx <= 0.5 && 

@ \abs(\exact(dx) - dx) / dx <= Oxl.p-53; 

@ */ 

/* Compute the constant coefficient of the stiffness matrix. */ 
al = dt/dx*v; 
a = al*al ; 
/*(§ assert 

@ 0<= a <= 1 && 

@ < \exact(a) <= 1 && 

@ \round-error(a) <= 0xl.p-49; 

@ */ 

/* Allocate space— time variable for the discrete solution. */ 
p = a r ray 2d_a I loc ( ni+1, nk+1); 

/* First initial condition and boundary conditions . */ 
/* Left boundary . */ 
p[0][0] = 0.; 

/* Time iteration —1 = space loop. */ 
/*@ loop invariant 
@ 1<= i <= ni Mi 

@ a na lytic. error (p , ni , i — 1, 0, a, dt); 
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@ loop variant ni — i; */ 
for (i=l; i<ni ; i++) { 

p[i][0] = p_zero(xs, I, i*dx); 

} 

/* Right boundary. */ 
p[ni][0] = 0.; 

/*@ assert a na lytic. e rro r (p , ni , ni , 0, a, dt); */ 

/* Second initial condition (with p.one—0) and boundary conditions . */ 
/* Left boundary. */ 
p[0][l] = 0.; 

/* Time iteration = space loop. */ 
/*@ loop invariant 
@ 1<= i <= ni && 

@ a n a lytic, error (p , ni , i — 1, 1, a, dt); 
@ loop variant ni — i; */ 
for ( i=l; i<ni ; i++) { 

/*© assert \abs (p[ i - 1] [0] ) <= 2; */ 
/*<§ assert \abs(p[ i ] [0] ) <= 2; */ 
/*<§ assert \abs (p [ i + 1][0] ) <= 2; */ 
dp = p[i+l][0] - 2.*p[i][0] + P [i -1][0]; 
p[ i ] [1] = p[i][0] + 0.5*a*dp; 

} 

/* Right boundary. */ 
P[ni][l] = 0.; 

/*(§ assert a na lytic. e rro r (p , ni , ni , 1, a, dt); */ 

/* Evolution problem and boundary conditions . */ 
/* Propagation — time loop. */ 
/*@ loop invariant 
@ 1<= k<= nk Mi 

@ a n a lytic. er ror (p , ni , ni , k, a, dt); 
@ loop variant nk — k; */ 
for (k=l; k<nk; k++) { 
/* Left boundary . */ 
p[0][k+l] = 0.; 

/* Time iteration k — space loop. */ 
/*<S loop invariant 
@ 1<= i <= ni && 

IS a n a I yt ic. error (p , ni , i — 1, k + 1 , a, dt); 
IS loop variant ni — i; */ 
for ( i=l; i<ni ; i++) { 

/*@ assert \abs(p[ i - l][k] ) <= 2; */ 

/*@ assert \abs(p [ i ] [ k] ) <= 2; */ 

/*@ assert \abs(p [ i +1] [k] ) <= 2; */ 

/*@ assert \abs(p [ i ] [ k-1]) <= 2; */ 

dp = p[i+l][k] - 2.*p[i][k] + p[i-l][k]; 

p[i][k+l] = 2.*p[i][k] - p[i][k-l] + a*dp; 

} 

/* Right boundary . */ 
p[ni][k+l] = 0.; 

/*@ assert a n a lyti c.er ror (p , ni , ni , k + 1, a, dt); */ 

} 

return p; 
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B Screenshot 

This is a screenshot of gWhy: we have the list of all the verification conditions and if they are 
proved by the various automatic tools. 
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p = an-ay2d_alioc(ril+l, nk+1) ; 

/" initial conditions (includes boundary conditions) ■/ 

p[3][0]=S.: 

/•(a loop invariant 1 <:= i ni SS analytic_errD r(p, n.i,i -1 , ,a, dt) ; 

8 lobp variant ni-i; */ 
for (±=l; i-mi; 1++) ( 

p[i][Q] = p_zerb(xs, I, fflgg) : 

.}■ 

p[ni][B] =3.; 



pl.B]|l] = 8.; 

/"@ loop invariant 1 <= i <= ni SS analytic_errbr[p,ni.i-l, 1 ,a.dt) ; 
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C Dependency Graph 

In the following graph, the ellipse nodes are Coq files formalizing the wave equation and the 
convergence of its numerical scheme. The octagon nodes are Coq files that deal with proof obliga- 
tions generated from the dirichlet . c program file, that is, propagation of round-off errors and 
error-free execution. Arrows represent dependencies between the Coq files. 
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